@LuciferYang
Contributor

@LuciferYang commented Feb 23, 2023

What changes were proposed in this pull request?

This PR aims to add support for more types to the `sql.functions#lit` function, including:

  • Decimal
  • Instant
  • Timestamp
  • LocalDateTime
  • Date
  • Duration
  • Period
  • CalendarInterval
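The new `java.time` types ultimately map onto plain integer fields: days or microseconds since the UNIX epoch for dates and timestamps, microseconds for day-time intervals, and months for year-month intervals. Below is a minimal sketch of those conversions using only `java.time`, assuming the units documented in the Connect `Literal` proto. `LiteralEncoding` and its methods are hypothetical names that only mirror Catalyst helpers such as `instantToMicros`, `durationToMicros` and `periodToMonths`, and they skip those helpers' overflow edge cases. `java.sql.Timestamp` and `java.sql.Date` are left out because Spark's conversion for them also involves legacy calendar rebasing.

```scala
import java.time.{Duration, Instant, LocalDate, LocalDateTime, Period, ZoneOffset}

// Hypothetical helpers (a sketch, not Spark's code) showing how java.time values
// reduce to the integer fields of the Connect `Literal` proto.
object LiteralEncoding {
  private val MicrosPerSecond = 1000000L
  private val NanosPerMicro = 1000L

  // date: days since the UNIX epoch (int32)
  def dateToDays(d: LocalDate): Int = Math.toIntExact(d.toEpochDay)

  // timestamp: microseconds since the UNIX epoch (int64).
  // getNano is always in [0, 999999999], so pre-1970 instants are handled correctly.
  def instantToMicros(i: Instant): Long =
    Math.addExact(Math.multiplyExact(i.getEpochSecond, MicrosPerSecond), i.getNano / NanosPerMicro)

  // timestamp_ntz: wall-clock time read as if it were UTC, then converted to microseconds
  def localDateTimeToMicros(ldt: LocalDateTime): Long =
    instantToMicros(ldt.toInstant(ZoneOffset.UTC))

  // day_time_interval: total microseconds (int64)
  def durationToMicros(d: Duration): Long =
    Math.addExact(Math.multiplyExact(d.getSeconds, MicrosPerSecond), d.getNano / NanosPerMicro)

  // year_month_interval: total months (int32); the days component of a Period is ignored
  def periodToMonths(p: Period): Int = Math.toIntExact(p.toTotalMonths)
}
```

For example, `Period.of(1, 2, 3)` encodes as 14 months; the 3 days are dropped.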

Why are the changes needed?

Make the `sql.functions#lit` function support more types.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Added new tests
  • Manually checked the new cases with Scala 2.13

@LuciferYang marked this pull request as draft on February 23, 2023 14:10
@LuciferYang
Contributor Author

There are some other things to do; I will continue tomorrow.

@LuciferYang marked this pull request as ready for review on February 23, 2023 14:26

```scala
    }
  }

  private def literalToColumn(literal: Literal): Column = {
```
Contributor

We don't need this. In fact, for the next Spark release we will be removing the Catalyst dependency.

Contributor Author

So we should remove `case v: Literal => literalToColumn(v)`?

Contributor

Yeah, we should. Sorry about that.

Contributor Author

OK, let me update the code.

```scala
    case v: Array[Byte] => createLiteral(_.setBinary(ByteString.copyFrom(v)))
    case v: collection.mutable.WrappedArray[_] => lit(v.array)
    case v: LocalDate => createLiteral(_.setDate(v.toEpochDay.toInt))
    case v: UTF8String => createLiteral(_.setString(v.toString))
```
Contributor

`UTF8String` is an internal API. Can you remove it?

Contributor Author

OK

@hvanhovell
Contributor

@LuciferYang thanks for the PR! Which data types are we still missing? I think we still need some collection support?

@LuciferYang
Contributor Author

I referred to Catalyst's `Literal.apply`:

```scala
def apply(v: Any): Literal = v match {
  case i: Int => Literal(i, IntegerType)
  case l: Long => Literal(l, LongType)
  case d: Double => Literal(d, DoubleType)
  case f: Float => Literal(f, FloatType)
  case b: Byte => Literal(b, ByteType)
  case s: Short => Literal(s, ShortType)
  case s: String => Literal(UTF8String.fromString(s), StringType)
  case s: UTF8String => Literal(s, StringType)
  case c: Char => Literal(UTF8String.fromString(c.toString), StringType)
  case ac: Array[Char] => Literal(UTF8String.fromString(String.valueOf(ac)), StringType)
  case b: Boolean => Literal(b, BooleanType)
  case d: BigDecimal =>
    val decimal = Decimal(d)
    Literal(decimal, DecimalType.fromDecimal(decimal))
  case d: JavaBigDecimal =>
    val decimal = Decimal(d)
    Literal(decimal, DecimalType.fromDecimal(decimal))
  case d: Decimal => Literal(d, DecimalType(Math.max(d.precision, d.scale), d.scale))
  case i: Instant => Literal(instantToMicros(i), TimestampType)
  case t: Timestamp => Literal(DateTimeUtils.fromJavaTimestamp(t), TimestampType)
  case l: LocalDateTime => Literal(DateTimeUtils.localDateTimeToMicros(l), TimestampNTZType)
  case ld: LocalDate => Literal(ld.toEpochDay.toInt, DateType)
  case d: Date => Literal(DateTimeUtils.fromJavaDate(d), DateType)
  case d: Duration => Literal(durationToMicros(d), DayTimeIntervalType())
  case p: Period => Literal(periodToMonths(p), YearMonthIntervalType())
  case a: Array[Byte] => Literal(a, BinaryType)
  case a: collection.mutable.WrappedArray[_] => apply(a.array)
  case a: Array[_] =>
    val elementType = componentTypeToDataType(a.getClass.getComponentType())
    val dataType = ArrayType(elementType)
    val convert = CatalystTypeConverters.createToCatalystConverter(dataType)
    Literal(convert(a), dataType)
  case i: CalendarInterval => Literal(i, CalendarIntervalType)
  case null => Literal(null, NullType)
  case v: Literal => v
  case _ =>
    throw QueryExecutionErrors.literalTypeUnsupportedError(v)
}
```

The missing one is `case a: Array[_]`.

@hvanhovell
Contributor

hvanhovell commented Feb 23, 2023

A couple of things:

  • For arrays, I guess we can just make a nested call to `lit` for each element. There is no need to get `CatalystTypeConverters` involved.
  • Bonus points if you check that all elements are of the same type.
  • We do have to check what `CatalystTypeConverters` currently supports, though. I think Map/Seq/Product support is also in there.
  • You may want to put this in a separate file.
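A minimal sketch of the first two suggestions, assuming a simplified, hypothetical literal model: `Lit`, `ArrayLit`, `LitSketch.lit` and the string type names are illustrative only, not the Connect client API (whose proto had no array case at this point). Each element goes through a nested `lit` call, and the resulting element types must all agree.

```scala
// Hypothetical, simplified literal model used only for this sketch.
sealed trait Lit { def dataType: String }
final case class IntLit(v: Int) extends Lit { val dataType = "int" }
final case class LongLit(v: Long) extends Lit { val dataType = "bigint" }
final case class StringLit(v: String) extends Lit { val dataType = "string" }
final case class ArrayLit(elementType: String, elements: Seq[Lit]) extends Lit {
  val dataType = s"array<$elementType>"
}

object LitSketch {
  def lit(v: Any): Lit = v match {
    case i: Int    => IntLit(i)
    case l: Long   => LongLit(l)
    case s: String => StringLit(s)
    case a: Array[_] =>
      // Nested call to lit for each element; no CatalystTypeConverters involved.
      val elements = a.toSeq.map(lit)
      val types = elements.map(_.dataType).distinct
      // Reject arrays whose elements do not share a single type.
      require(types.size <= 1, s"Array elements have mixed types: ${types.mkString(", ")}")
      // An empty array has no element to infer from; this sketch falls back to "void".
      ArrayLit(types.headOption.getOrElse("void"), elements)
    case other =>
      throw new IllegalArgumentException(s"Unsupported literal: $other")
  }
}
```

Because types are compared after the nested `lit` calls, nested arrays such as `Array(Array(1), Array(2))` are checked recursively. The placeholder type for empty arrays is only a stand-in; a real implementation would need a proper rule there.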

@hvanhovell
Contributor

Oh and if it becomes too large I am fine with merging this first, and doing array in a follow-up.

@LuciferYang
Contributor Author

Hmm... if I understand correctly, the current `Literal` message does not support any collection types? Do we need to add some new message types to support them?

```protobuf
message Literal {
  oneof literal_type {
    DataType null = 1;
    bytes binary = 2;
    bool boolean = 3;
    int32 byte = 4;
    int32 short = 5;
    int32 integer = 6;
    int64 long = 7;
    float float = 10;
    double double = 11;
    Decimal decimal = 12;
    string string = 13;
    // Date in units of days since the UNIX epoch.
    int32 date = 16;
    // Timestamp in units of microseconds since the UNIX epoch.
    int64 timestamp = 17;
    // Timestamp in units of microseconds since the UNIX epoch (without timezone information).
    int64 timestamp_ntz = 18;
    CalendarInterval calendar_interval = 19;
    int32 year_month_interval = 20;
    int64 day_time_interval = 21;
  }
}
```
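Every case of the `literal_type` oneof above is a scalar, so a client-side `lit` can only map scalar values onto it and has to reject collections. A hypothetical Scala model of that dispatch, where the `ProtoLiteral` cases are illustrative stand-ins for a few oneof cases rather than the real generated proto classes (the actual client sets fields on the proto builder instead):

```scala
import java.time.LocalDate

// Hypothetical stand-ins for a few cases of the `literal_type` oneof.
sealed trait ProtoLiteral
final case class IntegerValue(v: Int) extends ProtoLiteral
final case class LongValue(v: Long) extends ProtoLiteral
final case class StringValue(v: String) extends ProtoLiteral
final case class DateValue(daysSinceEpoch: Int) extends ProtoLiteral

object ScalarLit {
  def toProto(v: Any): ProtoLiteral = v match {
    case i: Int       => IntegerValue(i)
    case l: Long      => LongValue(l)
    case s: String    => StringValue(s)
    case d: LocalDate => DateValue(Math.toIntExact(d.toEpochDay))
    // No oneof case can hold a collection, so arrays, Seqs and Maps end up here.
    case other =>
      val typeName = if (other == null) "null" else other.getClass.getName
      throw new IllegalArgumentException(s"Unsupported literal type: $typeName")
  }
}
```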

@hvanhovell
Contributor

lol, no it does not. Let's just implement what we support, and do the rest in a different PR.

@LuciferYang
Contributor Author

OK

@LuciferYang
Contributor Author

LuciferYang commented Feb 23, 2023

> Oh and if it becomes too large I am fine with merging this first, and doing array in a follow-up.

I hope we can merge this PR first if nothing else needs to change. Also, I need to go to bed soon; it's 1:00 in my time zone :)

@hvanhovell
Contributor

go to sleep!

@LuciferYang
Contributor Author

@hvanhovell Is there anything else I can help with on the Scala client? @panbingkun told me that he also wants to take on some work related to Connect.

@hvanhovell
Contributor

@LuciferYang @panbingkun that would be great! I will create an epic with a bunch of TODOs.

@LuciferYang
Contributor Author

Does anything else in this PR need to change?

@hvanhovell
Contributor

@LuciferYang can you update your PR?

Contributor

@hvanhovell left a comment

LGTM

@hvanhovell
Contributor

@LuciferYang @panbingkun I created an epic with a bunch of things you can pick up: https://issues.apache.org/jira/browse/SPARK-42554

@amaliujia
Contributor

LGTM, but please rebase this PR to resolve the conflict.

@hvanhovell
Contributor

Merging to master/3.4. Thanks!

hvanhovell pushed a commit that referenced this pull request Feb 25, 2023
… types

### What changes were proposed in this pull request?
This PR aims to add support for more types to the `sql.functions#lit` function, including:

- Decimal
- Instant
- Timestamp
- LocalDateTime
- Date
- Duration
- Period
- CalendarInterval

### Why are the changes needed?
Make the `sql.functions#lit` function support more types.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Added new tests
- Manually checked the new cases with Scala 2.13

Closes #40143 from LuciferYang/functions-lit.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Herman van Hovell <herman@databricks.com>
(cherry picked from commit 2a4aab7)
Signed-off-by: Herman van Hovell <herman@databricks.com>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023